
provides a wealth of information. It is important to remember that these are only functional annotations of elements. Some of the elements are under only weak or no selection pressure. For a comparison between vertebrates including humans, the UCSC genome browser is recommended (https://genome.ucsc.edu), which by now compares a whole zoo of different genomes with each other (https://genome-euro.ucsc.edu/cgi-bin/hgGateway), but also includes information e.g. from the ENCODE project, such as methylation data, or predictions by RepeatMasker, such as LINEs.

How Can I Create a Phylogenetic Family Tree?

Phylogenetic trees provide an overview of functional and evolutionary relationships. A number of software options have been described in the book for this purpose. It is important to note that even a simple program like CLUSTAL (https://www.ebi.ac.uk/Tools/msa/clustalo/ [newest version: CLUSTAL Omega]; https://www.genome.jp/tools/clustalw/ [somewhat older version, aligns sequences pairwise over their whole length quite fast and draws a phylogenetic tree]) gives better results with experience (with CLUSTAL it is important to use sequences of approximately the same length; in addition, depending on the presumed evolutionary distance, one can correct for this with suitable matrices). More powerful software packages are correspondingly more complex to use. An example for accurate phylogenetic tree analysis is the PHYLogeny Inference Package (PHYLIP; https://evolution.genetics.washington.edu/phylip.html), which allows the construction of phylogenetic trees from sequences based on various methods, such as parsimony and likelihood, and supports bootstrapping to assess how well a tree is supported (see the website for detailed documentation). Another option is the software MUSCLE (Multiple Sequence Comparison by Log-Expectation; https://www.drive5.com/muscle/), which, in addition to multiple alignment, computes a phylogenetic tree based, for example, on UPGMA (Unweighted Pair Group Method with Arithmetic Mean; a fast method if there are many sequences) or neighbor joining (a better approximation to the true tree, but slow if there are too many sequences). The results from MUSCLE can also be saved in a PHYLIP-compatible format (Newick) and used there. Detailed documentation on MUSCLE can be found on the MUSCLE website (https://www.drive5.com/muscle/manual/) or on the EBI website (https://www.ebi.ac.uk/Tools/msa/muscle/help/).
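To make the UPGMA idea more concrete, the following is a minimal Python sketch that computes pairwise distances from an already aligned set of sequences and clusters them into a tree written in Newick format. It is only an illustration under simplifying assumptions: it uses the uncorrected proportion of differing sites (p-distance) rather than a substitution matrix, and the function names `p_distance`, `distance_matrix` and `upgma` are hypothetical helpers, not part of MUSCLE, CLUSTAL or PHYLIP, whose actual implementations differ.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of p-distances for a list of aligned sequences."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = p_distance(seqs[i], seqs[j])
    return m


def upgma(names, dist):
    """Cluster by UPGMA and return the tree as a Newick string.

    names: leaf labels; dist: full symmetric distance matrix (list of lists).
    """
    # Active clusters: id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two closest clusters (first minimum wins on ties).
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda p: d[frozenset(p)])
        sub_i, n_i, h_i = clusters.pop(i)
        sub_j, n_j, h_j = clusters.pop(j)
        height = d[frozenset((i, j))] / 2  # new node sits halfway between them
        newick = f"({sub_i}:{height - h_i:.3f},{sub_j}:{height - h_j:.3f})"
        # Distance to every remaining cluster: size-weighted arithmetic mean.
        for k in clusters:
            d[frozenset((k, next_id))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = (newick, n_i + n_j, height)
        next_id += 1
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


if __name__ == "__main__":
    # Toy alignment, invented purely for illustration.
    labels = ["seqA", "seqB", "seqC"]
    aligned = ["ACGTACGTAC", "ACGTACGTAT", "ACGTTCGTGT"]
    print(upgma(labels, distance_matrix(aligned)))
```

The resulting Newick string can be opened in common tree viewers. Because UPGMA assumes a constant rate of evolution along all branches, neighbor joining or the likelihood methods mentioned above are usually preferable when this assumption is doubtful.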

19.2 RNA: Sequence, Structure Analysis and Control of Gene Expression

How Do I Find and Analyze an RNA Sequence and Structure?

Transcription produces an RNA that folds into a secondary structure. One important database is Rfam. It is easy to search and use and gives an overview of different RNA families including sequence and structure. There are different functional RNA classes, such as miRNAs and lncRNAs, which have an impact on gene expression. Important databases include miRBase (https://www.mirbase.org/) and LNCipedia (https://www.lncipedia.org/), which provide specific information on sequence, structure and functional

19  Tutorial: An Overview of Important Databases and Programs